Results 1 - 6 of 6
1.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: covidwho-1704326

ABSTRACT

Protein lysine crotonylation (Kcr) is an important type of posttranslational modification that is associated with a wide range of biological processes. Identifying Kcr sites is critical to better understanding their functional mechanisms. However, existing experimental techniques for detecting Kcr sites are not cost-effective, creating a great need for new computational methods to address this problem. Here we describe Adapt-Kcr, an advanced deep learning model that uses adaptive embedding and combines a convolutional neural network with a bidirectional long short-term memory network and an attention architecture. On the independent testing set, Adapt-Kcr outperformed the current state-of-the-art Kcr prediction model, with improvements of 3.2% in accuracy and 1.9% in the area under the receiver operating characteristic curve. Compared with other Kcr models, Adapt-Kcr also showed a more robust ability to distinguish crotonylation from other lysine modifications. Another model (Adapt-ST) was trained to predict phosphorylation sites in SARS-CoV-2 and outperformed the equivalent state-of-the-art phosphorylation site prediction model. These results indicate that self-adaptive embedding features capture discriminative information better than handcrafted features and that, combined with an attention architecture, they could be an effective way of identifying protein Kcr sites. Together, our Adapt framework (including learned embedding features and attention architecture) has strong potential for predicting other protein posttranslational modification sites.
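The two reported evaluation metrics can be made concrete with a short sketch. It computes accuracy at an assumed score threshold of 0.5, and ROC AUC through its rank interpretation: the probability that a randomly chosen positive site scores higher than a randomly chosen negative site. The function names, the threshold, and the toy labels and scores are hypothetical illustrations. They are not taken from the Adapt-Kcr code or data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of sites whose thresholded score agrees with the 0/1 label."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and the same length")
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. The O(n_pos * n_neg) pair count is for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Toy example: 1 = crotonylated lysine, 0 = non-Kcr lysine (labels and scores are invented).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(accuracy(labels, scores))  # 4 of 6 sites classified correctly at threshold 0.5
print(roc_auc(labels, scores))   # 8 of 9 positive/negative pairs ranked correctly
```

Accuracy depends on the chosen threshold. AUC is threshold-free, which is why the two metrics can improve by different amounts.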


Subject(s)
Computational Biology , Deep Learning , Lysine/metabolism , Protein Processing, Post-Translational , Software , Algorithms , Benchmarking , Computational Biology/methods , Computational Biology/standards , Databases, Factual , Neural Networks, Computer , Phosphorylation , ROC Curve , Reproducibility of Results , User-Computer Interface
2.
Viruses ; 14(2)2022 01 19.
Article in English | MEDLINE | ID: covidwho-1625191

ABSTRACT

Whole-genome sequencing of viral isolates is critical for understanding transmission patterns and tracking the ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability, as in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and the resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Identified differences were mainly driven by ambiguous site content. When ambiguous sites were ignored, differences remained in only 2.3% (5/216) of pairwise comparisons, each involving a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if the variant allele frequency thresholds used for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practice approach to bioinformatics among laboratories working on a common outbreak problem.
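The two mechanisms in this abstract are threshold-dependent consensus calling and comparisons that skip ambiguous sites. A minimal sketch can illustrate both. It assumes a simple majority-base caller that emits `N` when the top allele's frequency falls below a variant allele frequency (VAF) threshold. It also assumes a hypothetical minimum depth of 10 reads and treats anything other than A, C, G, or T as ambiguous. This is not either of the pipelines used in the study.

```python
from collections import Counter

# Only A, C, G, T count as unambiguous here; N, IUPAC codes (e.g. R, Y) and gaps are skipped.
UNAMBIGUOUS = set("ACGT")


def call_consensus_base(read_bases: str, vaf_threshold: float, min_depth: int = 10) -> str:
    """Return the majority base if its allele frequency meets the threshold, else 'N'."""
    counts = Counter(b for b in read_bases.upper() if b in UNAMBIGUOUS)
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"  # too little coverage to call a base (min_depth is an assumed default)
    base, n = counts.most_common(1)[0]
    return base if n / depth >= vaf_threshold else "N"


def compare_consensus(a: str, b: str) -> tuple[float, int]:
    """Compare two aligned, equal-length consensus genomes.

    Returns (identity over sites unambiguous in both, number of single-nucleotide differences).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            diffs += x != y
    identity = 1.0 - diffs / compared if compared else float("nan")
    return identity, diffs


pileup = "A" * 9 + "G" * 3  # 12 reads at one position: A at 75%, G at 25%
print(call_consensus_base(pileup, vaf_threshold=0.6))  # 'A'
print(call_consensus_base(pileup, vaf_threshold=0.8))  # 'N': same reads, different threshold
print(compare_consensus("ACGTNACGTA", "ACGTAACGTG"))   # 9 comparable sites, 1 difference
```

In this example, the same reads produce a called base at one threshold and an ambiguous site at another. This shows how lab-specific VAF thresholds can change both ambiguous-site content and downstream SNV distances.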


Subject(s)
Computational Biology/standards , Consensus , Genome, Viral , Laboratories/standards , Public Health , SARS-CoV-2/genetics , Australia , Computational Biology/methods , Humans , Phylogeny , SARS-CoV-2/classification , Whole Genome Sequencing
3.
Nat Methods ; 18(12): 1496-1498, 2021 12.
Article in English | MEDLINE | ID: covidwho-1612200

ABSTRACT

The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used across all these containers can deliver truly findable, accessible, interoperable and reusable (FAIR) bioimaging data.
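A chunked format such as Zarr suits large images partly because a reader only needs the chunks that overlap the region it requests. The sketch below assumes the Zarr v2 convention, in which each chunk is stored under a key formed by joining its chunk indices with a separator. The separator is `.` by default, or `/` when a nested layout is configured. Zarr v3 uses a different default key encoding. The function name is hypothetical. The sketch only computes keys and does not read data or call the zarr library.

```python
from itertools import product
from typing import Sequence


def chunks_for_region(
    starts: Sequence[int],
    stops: Sequence[int],
    chunk_shape: Sequence[int],
    separator: str = ".",
) -> list[str]:
    """List the chunk keys that cover the half-open region [start, stop) in every dimension."""
    if not (len(starts) == len(stops) == len(chunk_shape)):
        raise ValueError("starts, stops and chunk_shape must have the same length")
    ranges = []
    for start, stop, size in zip(starts, stops, chunk_shape):
        if stop <= start:
            return []  # empty region touches no chunks
        # First and last chunk index overlapping this dimension's range.
        ranges.append(range(start // size, (stop - 1) // size + 1))
    # Cartesian product of per-dimension chunk indices, first dimension varying slowest.
    return [separator.join(str(i) for i in idx) for idx in product(*ranges)]


# A 100 x 600 pixel window of a 2D image stored in 256 x 256 chunks.
print(chunks_for_region((200, 0), (300, 600), (256, 256)))
# ['0.0', '0.1', '0.2', '1.0', '1.1', '1.2']
```

Only 6 chunks are touched here, however large the full image is. This property is what makes chunked formats practical for remote and very large datasets.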


Subject(s)
Computational Biology/instrumentation , Computational Biology/standards , Metadata , Microscopy/instrumentation , Microscopy/standards , Software , Benchmarking , Computational Biology/methods , Data Compression , Databases, Factual , Information Storage and Retrieval , Internet , Microscopy/methods , Programming Languages , SARS-CoV-2
4.
Nat Genet ; 53(6): 809-816, 2021 06.
Article in English | MEDLINE | ID: covidwho-1223103

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large, and will undoubtedly grow manyfold, placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and of data visualization, making it possible to complete placements within the time constraints of real-time contact tracing. Our method thus addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower SARS-CoV-2 research and genomic contact tracing in laboratories worldwide.
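The abstract does not spell out the data structure. One common compact encoding of an inferred history is a mutation-annotated tree, in which each branch stores only the mutations that arose along it. A new sample can then be placed at the node whose accumulated mutations differ from the sample's by the fewest changes, which is a parsimony criterion. The sketch below illustrates that idea under simplifying assumptions: there are no back-mutations, and every node is searched exhaustively. The class and function names, the toy tree, and its mutation labels (reference base, position, alternate base) are hypothetical. This is not presented as the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Tree node; `mutations` holds the mutations on the branch leading into this node."""
    name: str
    mutations: frozenset[str] = frozenset()
    children: list["Node"] = field(default_factory=list)


def place_sample(root: Node, sample: set[str]) -> tuple[str, int]:
    """Return (node name, cost) for the node needing the fewest extra mutations to reach `sample`.

    A node's state is the union of mutations on its root-to-node path (assumes no back-mutations).
    Cost is the size of the symmetric difference between that state and the sample's mutations.
    Every node is visited; on ties the first node visited is kept.
    """
    best: Optional[tuple[str, int]] = None
    stack = [(root, frozenset(root.mutations))]
    while stack:
        node, state = stack.pop()
        cost = len(state ^ sample)
        if best is None or cost < best[1]:
            best = (node.name, cost)
        for child in node.children:
            stack.append((child, state | child.mutations))
    return best


# Toy tree (made up for illustration).
tree = Node("root", children=[
    Node("A", frozenset({"C241T", "A23403G"}), children=[
        Node("B", frozenset({"G28881A"})),
        Node("C", frozenset({"C3037T"})),
    ]),
    Node("D", frozenset({"T8782C"})),
])
print(place_sample(tree, {"C241T", "A23403G", "G28881A", "C14408T"}))  # ('B', 1)
```

Storing only per-branch mutations keeps the structure small even for very large trees. The exhaustive search here is the simplest correct version of the placement step, not an optimized one.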


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/genetics , Software , Algorithms , Computational Biology/standards , Databases, Genetic , Genome, Viral , Humans , Molecular Sequence Annotation , Mutation , Web Browser
6.
Nucleic Acids Res ; 49(D1): D18-D28, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-917706

ABSTRACT

The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multi-omics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as to Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. In particular, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated to provide easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.


Subject(s)
Big Data , Computational Biology/standards , Databases, Genetic , Genomics/statistics & numerical data , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , China , Computational Biology/methods , Computational Biology/organization & administration , Computational Biology/trends , Data Mining/methods , Data Mining/statistics & numerical data , Epidemics , Genetic Variation , Genome, Viral/genetics , Genomics/methods , Genomics/organization & administration , Humans , Internet , Search Engine/methods , Search Engine/statistics & numerical data